Generative AI for computer vision
Resources
- Text to video model arena
- How Far is Video Generation from World Model: A Physical Law Perspective
- AI Image Generators Compared Side-By-Side Reveals Stark Differences
Models
- Mochi (Genmo) - an open-source, state-of-the-art video generation model
- Infinite AI Artboard - Recraft
- Midjourney
- DALL-E (OpenAI)
- Imagen (Google)
- Imagen Video
- Stable Diffusion
- Make-A-Video
- Leonardo.AI
Code
References
- #PAPER Video Pixel Networks (Kalchbrenner 2016)
- #PAPER Pixel RNNs - Pixel Recurrent Neural Networks (van den Oord 2016)
- PixelRNN introduces an architecture built from two-dimensional recurrent layers (Row LSTM and Diagonal BiLSTM) with residual connections, which predicts pixels sequentially along both spatial dimensions. It models the joint distribution of pixels as a product of conditional distributions, with each pixel conditioned on the previously generated pixels (those above and to the left). At publication, the model reported state-of-the-art log-likelihoods on natural images.
- https://medium.com/a-paper-a-day-will-have-you-screaming-hurray/day-4-pixel-recurrent-neural-networks-1b3201d8932d
- https://christineai.blog/pixelcnn-and-pixelrnn/
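The product-of-conditionals factorization described for PixelRNN can be illustrated with a toy example. This is only a minimal sketch for binary pixels in raster-scan order: `raster_log_likelihood` and `toy_conditional` are hypothetical names, and the hand-written conditional merely stands in for the recurrent network, which the paper learns from data.

```python
import numpy as np

def raster_log_likelihood(image, cond_prob):
    """Sum log p(x_i | x_<i) over binary pixels in raster-scan order."""
    flat = np.asarray(image).ravel()  # row by row, left to right
    total = 0.0
    for i, pixel in enumerate(flat):
        p_one = cond_prob(flat[:i])  # probability that pixel i equals 1
        total += np.log(p_one if pixel == 1 else 1.0 - p_one)
    return total

def toy_conditional(previous):
    """Hypothetical stand-in for the network: smoothed mean of earlier pixels."""
    return (previous.sum() + 1.0) / (previous.size + 2.0)

image = np.array([[1, 1], [0, 1]])
log_p = raster_log_likelihood(image, toy_conditional)
```

Because every factor is a valid conditional distribution, the probabilities of all possible images sum to one, which is what makes exact likelihood evaluation possible for this family of models.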
- #PAPER Conditional Image Generation with PixelCNN Decoders (van den Oord 2016)
- #PAPER PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications (Salimans 2017)
- #PAPER FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models (Grathwohl 2018)
- #PAPER Generating Realistic Geology Conditioned on Physical Measurements with Generative Adversarial Networks (Dupont 2018)
- Using a generator G and a discriminator D, the goal is to generate realistic images conditioned on a set of known pixels
- The total loss combines a prior loss (generated images should score highly under D) and a context loss (the generated image should match the known pixels)
- For the context loss, a smoothed mask is used
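The combination of the two losses in the notes above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact formulation: the squared-error form of the context loss, the `-log D` form of the prior loss, and the weight `lam` are all hypothetical choices, and the mask is taken as given, so any smoothing would be applied before calling these functions.

```python
import numpy as np

def context_loss(generated, known_values, mask):
    """Mask-weighted squared error between generated and known pixels."""
    diff = (generated - known_values) * mask
    return float(np.sum(diff ** 2))

def prior_loss(d_score, eps=1e-12):
    """Small when the discriminator score d_score in (0, 1] is close to 1."""
    return float(-np.log(d_score + eps))

def total_loss(generated, known_values, mask, d_score, lam=0.1):
    """Context loss plus lam times prior loss (lam is a hypothetical weight)."""
    return context_loss(generated, known_values, mask) + lam * prior_loss(d_score)

generated = np.array([[0.2, 0.8], [0.5, 0.1]])
known_values = np.array([[0.0, 0.0], [0.5, 0.0]])
mask = np.array([[1.0, 0.0], [1.0, 0.0]])  # only the left column is known
loss = total_loss(generated, known_values, mask, d_score=0.9)
```

Pixels where the mask is zero contribute nothing to the context loss, so the generator is free to fill them in, constrained only by the prior term.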
- #PAPER Parametric generation of conditional geological realizations using generative neural networks (Chan 2019)
- #PAPER Parametrization of Stochastic Inputs Using Generative Adversarial Networks With Application in Geology (Chan 2020)
- #PAPER Generative Models as Distributions of Functions (Dupont 2021)
- Generative models are typically trained on grid-like data such as images, which ties them to the underlying grid resolution
- Instead of discretized grids, individual data points are parametrized by continuous functions, and the generative model learns a distribution over those functions
- Coordinate and feature pairs are treated as point clouds (sets with an underlying notion of distance), leveraging the PointConv framework
- The model can learn rich distributions of functions independently of data type and resolution. Application to AI/Computer Vision/Super-resolution
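The idea of treating an image as a set of coordinate and feature pairs can be sketched as follows. This is a minimal illustration, not the paper's code: `image_to_point_cloud` is a hypothetical helper, and normalizing coordinates to [0, 1] is an assumption made here for simplicity.

```python
import numpy as np

def image_to_point_cloud(image):
    """Return (coords, features): one (row, col) point per pixel and its value(s)."""
    image = np.asarray(image, dtype=float)
    height, width = image.shape[:2]
    rows = np.linspace(0.0, 1.0, height)  # assumed normalization to [0, 1]
    cols = np.linspace(0.0, 1.0, width)
    grid_r, grid_c = np.meshgrid(rows, cols, indexing="ij")
    coords = np.stack([grid_r.ravel(), grid_c.ravel()], axis=1)  # (H*W, 2)
    features = image.reshape(height * width, -1)                 # (H*W, channels)
    return coords, features

image = np.arange(6).reshape(2, 3)
coords, features = image_to_point_cloud(image)
```

Once the data is expressed this way, the same function can in principle be queried at any set of coordinates, which is why the representation is not tied to a fixed grid resolution.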
- #PAPER Score-Based Generative Modeling through Stochastic Differential Equations (Song 2021)
- #PAPER Florence: A New Foundation Model for Computer Vision (Yuan 2021)
- #PAPER Diverse Generation from a Single Video Made Possible (Haim 2021)
- #PAPER Scaling Autoregressive Models for Content-Rich Text-to-Image Generation (Yu 2022)
- #CODE https://github.com/google-research/parti
- https://parti.research.google/
- Pathways Autoregressive Text-to-Image model (Parti), an autoregressive text-to-image generation model that achieves high-fidelity photorealistic image generation and supports content-rich synthesis involving complex compositions and world knowledge
- #PAPER Autoregressive Image Generation using Residual Quantization (Lee 2022)
- #PAPER MultiMAE: Multi-modal Multi-task Masked Autoencoders (Bachmann 2022)
- #PAPER InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions (Wang 2022)
- #PAPER GenAI Arena: An Open Evaluation Platform for Generative Models (Jiang 2024)